Protein Engineering, Design and Selection — Latest Matching Preprints

1

Cell-free protein synthesis as a method to rapidly screen machine learning-directed protease variants

Thornton, E. L.; Boyle, J. T.; Laohakunakorn, N.; Regan, L.

2025-01-25 synthetic biology Community evaluation 10.1101/2025.01.24.634768 medRxiv

Top 0.1%

26.3%

Machine learning (ML) tools have revolutionised protein structure prediction, engineering and design, but the best ML tool is only as good as the training data it learns from. To obtain high quality structural or functional data, protein purification is typically required, which is both time and resource consuming - especially at the scale required to train ML tools. Here, we showcase cell-free protein synthesis (CFPS) as a straightforward and fast tool for screening and scoring the activity of protein variants for ML workflows. We demonstrate the utility of the system by improving the kinetic qualities of a protease. By rapidly screening just 48 random variants to initially sample the fitness landscape, followed by 32 more targeted variants, we identified several protease variants with improved kinetic properties.

2

Conformation-specific Design: a New Benchmark and Algorithm with Application to Engineer a Constitutively Active MAP Kinase

Stern, J.; Alharbi, S.; Sandholu, A.; Arold, S. T.; Della Corte, D.

2025-04-24 synthetic biology 10.1101/2025.04.23.650138 medRxiv

Top 0.1%

21.7%

A general method for designing proteins with high conformational specificity is desirable for a variety of applications, including enzyme design and drug target redesign. To assess the ability of algorithms to design for conformational specificity, we introduce MotifDiv, a benchmark dataset of 200 conformational specificity design challenges. We also introduce CSDesign, an algorithm for designing proteins with high preference for a target conformation over an alternate conformation. On the MotifDiv benchmark, CSDesign designs protein sequences that are predicted to prefer the target conformation. We apply this method in vitro to redesign human MAP kinase ERK2, an enzyme with active and inactive conformations. Out of two designs for the active conformation, one increased activity sufficiently to retain activity in the absence of activating phosphorylations, a property not present in the wild-type protein.

3

A Yeast Surface Display Platform for Screening Dimeric Mammalian Receptors

Slaton, E. W.; Krivanek, E. C.; Kimmel, B. R.

2026-01-30 synthetic biology 10.64898/2026.01.29.702702 medRxiv

Top 0.1%

15.5%

Discovering proteins that modulate receptor activity remains a key challenge in the field of protein design and engineering. Traditionally, identifying proteins that interact with receptors has relied on binding as a selection criterion, which yields limited information about the function of discovered binders in a library, including their ability to activate or block signaling cascades associated with the receptor of interest. As a result, extensive downstream characterization is required to assess the biological relevance of discovered binders. To address this issue, we have developed a high-throughput system for screening dimeric mammalian receptors using yeast surface display. We demonstrate programmed dimerization of the extracellular domains of mammalian receptors in yeast via engineered induction pathways, enabling receptor expression and secretion of the associated native cytokines. Surface expression of the receptor subunits, together with cytokine-induced dimerization, indicates that the receptor has been activated and would be expected to trigger a DNA-driven signaling cascade within a mammalian cell. This system provides a modular platform technology that advances existing yeast-display systems, demonstrating the effectiveness of these high-throughput platforms for screening the function of mammalian receptors. This work is expected to provide a rapid, cost-effective approach to the molecular discovery of novel biologics targeting dimeric mammalian receptors.

4

Yeast biopanning for detecting antibody binding to site-specific phosphorylations in tau

Arbaciauskaite, M.; Pirhanov, A.; Lei, Y.; Cho, Y. K.

2022-01-18 bioengineering 10.1101/2022.01.15.476481 medRxiv

Top 0.1%

13.2%

The detection of phosphorylated tau (p-tau) in clinical samples is extremely important for the detection of Alzheimer's disease (AD) as well as other neurodegenerative diseases. Recent reports show that low levels of p-tau in plasma can serve as a reliable biomarker for detecting AD before the onset of memory loss. The ability to detect such low levels of p-tau depends on antibodies specific to the post-translationally modified protein. However, the need for reliable phospho-site-specific antibodies persists due to a lack of approaches for identifying monoclonal antibodies and characterizing non-specific binding. Here, we report a novel approach using the principles of yeast biopanning to create a robust platform that uses synthetic peptides as target antigens. Using peptides as antigens enables screening of antibodies against defined post-translational modification sites, particularly for targeting intrinsically disordered proteins such as the human tau protein. To readily assess yeast binding and distinguish non-specific binding, we developed bi-directional expression vectors that allow antibody fragment surface display and intracellular fluorescent protein expression. We show that our platform can specifically and robustly detect a defined site within the p-tau target peptide when compared against non-phosphorylated controls. By improving biopanning parameters, we enabled phospho-specific capture of yeast cells displaying single-chain variable region fragments (scFvs) against p-tau with a wide range of affinities (KD = 0.2 to 60 nM). These results demonstrate that yeast biopanning can robustly capture yeast cells based on phospho-site-specific antibody binding, opening doors for facile identification of high-quality monoclonal antibodies.

5

Protein CREATE enables closed-loop design of de novo synthetic protein binders

Lourenco, A. L.; Subramanian, A. M.; Spencer, R. K.; Miao, J.; Anaya, M.; Fu, W.; Chow, E. D.; Thomson, M.

2024-12-22 bioengineering 10.1101/2024.12.20.629847 medRxiv

Top 0.1%

13.1%

Proteins have proven to be useful agents in a variety of fields, from serving as potent therapeutics to enabling complex catalysis for chemical manufacture. However, they remain difficult to design and are instead typically selected for using extensive screens or directed evolution. Recent developments in protein large language models have enabled fast generation of diverse protein sequences in unexplored regions of protein space predicted to fold into varied structures, bind relevant targets, and catalyze novel reactions. Nevertheless, we lack methods to characterize these proteins experimentally at scale and to update generative models based on those results. We describe Protein CREATE (Computational Redesign via an Experiment-Augmented Training Engine), an integrated computational and experimental pipeline that incorporates an experimental workflow leveraging next-generation sequencing and phage display with single-molecule readouts to collect vast amounts of quantitative binding data for updating protein large language models. We use Protein CREATE to generate and assay thousands of designed binders to IL-7 receptor and insulin receptor with parallel positive and negative selections to identify on-target binders. We discover not only individual novel binders but also features of ligand-receptor binding, including specific preservation of the IL7R-ligand hydrophobic interface and the existence of multiple approaches to contact the insulin receptor. We also demonstrate the importance of structural features, such as the lack of unpaired cysteine residues, toward design fidelity, and find that computational pre-screening metrics, such as interchain predicted TM-score (iPTM), while useful, are imperfect predictors, as they neither guarantee experimental binding nor rule it out. We use the data collected from Protein CREATE to score designs from the initial generative models. Overall, Protein CREATE will power future closed-loop design-build-test cycles to enable fine-grained design of protein binders.
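
The parallel positive/negative selection readout can be illustrated with a minimal Python sketch. It assumes, as a simplification rather than the authors' actual scoring, that each design is ranked by the log2 ratio of its pseudocount-smoothed read frequency in the positive (on-target) pool to that in the negative pool; the function name, pseudocount, and read counts are hypothetical.

```python
import math
from typing import Dict


def enrichment_scores(pos_counts: Dict[str, int],
                      neg_counts: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Hypothetical log2(positive / negative) enrichment per design.

    Assumes raw NGS read counts per design from a positive selection
    (target present) and a parallel negative selection (target absent).
    """
    designs = set(pos_counts) | set(neg_counts)
    # Pseudocounts keep designs missing from one pool from giving log(0).
    pos_total = sum(pos_counts.values()) + pseudocount * len(designs)
    neg_total = sum(neg_counts.values()) + pseudocount * len(designs)
    scores = {}
    for design in designs:
        pos_freq = (pos_counts.get(design, 0) + pseudocount) / pos_total
        neg_freq = (neg_counts.get(design, 0) + pseudocount) / neg_total
        # Positive scores mean the design is favored by the on-target selection.
        scores[design] = math.log2(pos_freq / neg_freq)
    return scores


# Hypothetical counts: design_a is enriched on target, design_b is not.
print(enrichment_scores({"design_a": 7, "design_b": 1},
                        {"design_a": 1, "design_b": 7}))
```

With these toy counts, design_a scores about +2 and design_b about -2; any cutoff used to call a design an on-target binder would be a further, assumption-laden choice.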

6

Crowdsourced Protein Design: Lessons From the Adaptyv EGFR Binder Competition

Cotet, T.-S.; Krawczuk, I.; Pacesa, M.; Nickel, L.; Correia, B. E.; Haas, N.; Qamar, A.; Challacombe, C. A.; Kidger, P.; Ferragu, C.; Naka, A.; Castorina, L. V.; Subr, K.; Kluonis, T.; Stam, M. J.; Unal, S. M.; Wood, C. W.; Stocco, F.; Ferruz, N.; Kurumida, Y.; Calia, C. N.; Paesani, F.; Machado, L. d. A.; Belot, E.; Gitter, A.; Campbell, M. J.; Hallee, L.; Adaptyv Competition Organizers,

2025-04-23 bioengineering 10.1101/2025.04.17.648362 medRxiv

Top 0.1%

11.8%

In this report, we summarize and analyze the 2024 Adaptyv protein design competition. Participants used computational and machine learning (ML) methods of their choice to design proteins that bind the Epidermal Growth Factor Receptor (EGFR), a key drug target involved in cell growth, differentiation, and cancer development. Over 1,800 designs were submitted across two rounds. Of these, 601 proteins were selected and characterized for expression and binding affinity to EGFR, with competitors both optimizing existing binders (KD = 1.21 nM) and creating de novo binders (KD = 82 nM). All selected designs were experimentally validated using Adaptyv's automated Bio-Layer Interferometry (BLI) pipeline. This competition illustrates the potential of crowdsourcing to drive creativity and innovation in protein design. However, it also exposed key challenges, such as the lack of standardized benchmarks, experimental design targets, and robust computational metrics for method comparison. We anticipate that future competitions will address these gaps and further motivate progress in computational protein design.

7

Engineering Candida boidinii formate dehydrogenase for activity with NMN(H)

Vainstein, S.; Banta, S.

2024-07-22 bioengineering 10.1101/2024.07.17.604001 medRxiv

Top 0.1%

11.7%

Multi-step enzymatic reaction cascades often involve cofactors that serve as electron donors/acceptors in addition to the primary substrates. The co-localization of cascades can lead to cross-talk and competition, which can be unfavorable for the production of a targeted product. Orthogonal pathways allow reactions of interest to operate independently from the metabolic reactions within a cell; non-canonical cofactor analogs have been explored as a means to create these orthogonal pathways. Here, we aimed to engineer the formate dehydrogenase from Candida boidinii (CbFDH) for activity with the non-canonical cofactor nicotinamide mononucleotide (NMN(H)). We used PyRosetta and structural alignment to design mutations that enable CbFDH to use NMN+ for the oxidation of formate. Although the suggested mutations did not result in enhanced activity with NMN+, we found that PyRosetta was able to easily design single mutations that disrupted all enzymatic activity.

8

Computational Design of Soluble CCR8 Analogues with Preserved Antibody Binding

Nguyen, T.; Liu, S.; Li, Y.; Cong, L.; Shek, R.; Lee, T. H.; Yi, L.; Greisen, P.

2025-08-22 bioengineering 10.1101/2025.08.18.670068 medRxiv

Top 0.1%

10.9%

G protein-coupled receptors (GPCRs) represent the largest class of drug targets, yet their membrane-embedded nature poses significant challenges for structural studies and therapeutic development. Here, we report the successful computational design and experimental validation of soluble CCR8 analogues that maintain native antibody binding properties. Using an integrated pipeline combining ProteinMPNN-sol sequence design and structure-based filtering, we generated 13 CCR8 analogues from 272 initial designs across three N-terminal truncation strategies. Experimental validation confirmed a 62% success rate (8/13 designs), with protein yields of 1.19-73.72 mg/L in aqueous buffer, representing a significant improvement over traditional membrane protein production methods. Surface plasmon resonance analysis demonstrated that all analogues retained mAb1 binding, with dissociation constants ranging from 77 to 857 nM, comparable to wild-type CCR8 (KD = 190 nM). Despite extensive sequence divergence (10-13% identity with wild-type CCR8), structural integrity was preserved, as evidenced by maintained binding affinity and computational structural validation. This work demonstrates the feasibility of computationally designing functional soluble analogues of challenging membrane proteins, with implications for accelerating drug discovery, antibody development, and structural biology studies. Our approach addresses critical limitations in membrane protein accessibility while preserving native epitope presentation, opening new avenues for therapeutic target characterization and binder discovery.

9

An enhanced yeast display platform demonstrates the binding plasticity under various selection pressures

Zahradnik, J.; Dey, D.; Marciano, S.; Schreiber, G.

2020-12-17 biochemistry 10.1101/2020.12.16.423176 medRxiv

Top 0.1%

10.2%

Yeast surface display is a popular in vitro evolution method. Here, we enhanced the method through multiple rounds of DNA and protein engineering, resulting in increased protein stability, surface expression, and fluorescence. The pCTcon2 yeast display vector was rebuilt, introducing reporters tailored to surface exposure (eUnaG2 and DnbALFA) and creating a new platform of C- and N-terminal fusion vectors. In addition to gains in simplicity, speed, and cost, new applications were included to monitor protein surface exposure and protein retention in the secretion pathway. The enhanced methodologies were applied to investigate de novo evolution of protein-protein interaction sites. Selecting for binding from a mix of six protein libraries against two targets under high-stringency selection led to the isolation of single high-affinity binders to each of the targets, without the need for high-complexity libraries. Conversely, low-stringency selection resulted in many solutions for weak binding, demonstrating the plasticity of weak de novo interactions.

10

Discovery and characterization of llama VHH targeting the RO form of human CD45

Lupardus, P. J.; Rokkam, D.

2020-09-02 biochemistry 10.1101/2020.09.01.278853 medRxiv

Top 0.1%

10.1%

CD45 is an abundant and highly active cell-surface protein tyrosine phosphatase (PTP) found on cells of hematopoietic origin. CD45 is of particular importance for T-cell function, playing a key role in the activation/inactivation cycle of the T-cell receptor signaling complex. The extracellular domain of CD45 consists of an N-terminal mucin-like domain, which can be alternatively spliced to a core domain (RO) consisting of four domains with fibronectin 3 domain (FN3)-like topology. The study of CD45 has been hampered by a small set of publicly available antibodies, which we characterized as specific to the N-terminal FN3 domains of CD45 RO. To broaden the human CD45 reagent set, we identified anti-CD45 single-domain VHH antibodies from a post-immune llama phage display library. Using a yeast display domain mapping system and affinity measurements, we characterized seven unique clonotypes specific for CD45 RO, including binders that target each of the four FN3-like domains. These VHH molecules are important new tools for studying the role of CD45 in T-cell function in vitro and in vivo.

11

Computational Engineering of a Therapeutic Antibody to Inhibit Multiple Mutants of HER2 Without Compromising Inhibition of the Canonical HER2

Peled, S.; Guez-Haddad, J.; Zur Biton, N.; Nimrod, G.; Fischman, S.; Fastman, Y.; Ofran, Y.

2023-07-25 bioinformatics 10.1101/2023.07.21.550003 medRxiv

Top 0.1%

10.0%

Genomic germline and somatic variations may impact drug binding and even lead to resistance. However, designing a different drug for each mutant may not be feasible. In this study, we identified the most common cancer somatic mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC) that occur in structurally characterized binding sites of approved therapeutic antibodies. We found two HER2 mutations, S310Y and S310F, that substantially compromise binding of Pertuzumab, a widely used therapeutic, and lead to drug resistance. To address these mutations, we designed a multi-specific version of Pertuzumab that retains its original function while also binding these HER2 variants. This new antibody is stable and inhibits HER3 phosphorylation in a cell-based assay for all three variants, suggesting it can inhibit HER2-HER3 dimerization in patients with any of the variants. This study demonstrates how a small number of carefully selected mutations can add new specificities to an existing antibody without compromising its original function, creating a single therapeutic antibody that targets multiple common variants: a drug that is not personalized, yet whose activity may be.

12

De novo design of phospho-tyrosine peptide binders

Bauer, M. S.; Zhang, J. Z.; Wu, K.; Lee, G. R.; Coventry, B.; Klupt, K. A.; Shi, J.; Brent, R. I.; Li, X.; Moller, C.; Roullier, N.; Vafeados, D. K.; Kalvet, I.; Skotheim, R. K.; Zhu, S.; Motmaen, A.; Herrmann, L. C.; Sturmfels, P.; Tischer, D.; Altae-Tran, H. R.; Juergens, D.; Krishna, R.; Ahern, W.; Yim, J.; Bera, A. K.; Kang, A.; Joyce, E.; Lu, A.; Stewart, L.; DiMaio, F.; Baker, D.

2025-09-30 bioengineering 10.1101/2025.09.29.678898 medRxiv

Top 0.1%

9.6%

Phosphorylation on tyrosine is a key step in many signaling pathways. Despite recent progress in de novo design of protein binders, there are no current methods for designing binders that recognize phosphorylated proteins and peptides; this is a challenging problem as phosphate groups are highly charged, and phosphorylation often occurs within unstructured regions. Here we introduce RoseTTAFold Diffusion 2 for Molecular Interfaces (RFD2-MI), a deep generative framework for the design of binders for protein, ligand, and covalently modified protein targets. We demonstrate the power and versatility of this method by designing binders for four critical phosphotyrosine sites on three clinically relevant targets: Cluster of Differentiation 3 (CD3ε), Epidermal Growth Factor Receptor (EGFR), Insulin Receptor (INSR) and Signal Transducer and Activator of Transcription 5 (STAT5). Experimental characterization shows that the designs bind their phosphotyrosine-containing targets with affinities comparable to native binding sites and have negligible binding to non-phosphorylated targets or phosphopeptides with different sequences. X-ray crystal structures of generated binders to CD3ε and EGFR are very close to the design models, demonstrating the accuracy of the design approach. A designed binder to an EGFR intracellular region phosphorylated upon EGF activation co-localizes with the receptor following EGF stimulation in single-particle tracking (SPT) experiments, demonstrating pY-specific recognition in living cells. RFD2-MI provides a generalizable all-atom diffusion framework for probing and modulating phosphorylation-dependent signaling, and more generally, for developing research tools and targeted therapeutics against post-translationally modified proteins.

13

FLIP2: Expanding Protein Fitness Landscape Benchmarks for Real-World Machine Learning Applications

Didi, K.; Alamdari, S.; Lu, A. X.; Wittmann, B.; Johnston, K. E.; Amini, A. P.; Madani, A. K.; Czeneszew, M.; Dallago, C.; Yang, K. K.

2026-02-24 bioengineering 10.64898/2026.02.23.707496 medRxiv

Top 0.1%

9.5%

Machine learning methods that predict protein fitness from sequence remain sensitive to changes in data distributions, limiting generalization across common conditions encountered in protein engineering. In practice, protein engineers are thus left wondering about the effective utility of ML tools. The FLIP benchmark established protocols for testing generalization under some domain shifts, but it was limited to measurements of thermostability, binding, and viral capsid viability. We introduce FLIP2, a protein fitness benchmark spanning seven new datasets, including enzymes, protein-protein interactions, and light-sensitive proteins, as well as splits that measure generalization relevant to real-world protein engineering campaigns. Evaluating a suite of benchmark models across these datasets and suites reveals that simpler models often matched or outperformed fine-tuned protein language models on FLIP2, challenging the utility of existing transfer learning techniques. Provenance for all datasets has been recorded, and we redistribute all data under CC-BY 4.0 to facilitate continued progress.
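
What a generalization-oriented split looks like can be made concrete with a short Python sketch. It assumes a mutation-count split, in which models train on variants close to wild type and are tested on more distant ones; this is one illustrative split style, not a definition of FLIP2's actual splits, and the function names and sequences are hypothetical.

```python
from typing import List, Tuple

Variant = Tuple[str, float]  # (sequence, measured fitness)


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mutation_count_split(wild_type: str, variants: List[Variant],
                         max_train_mutations: int = 1
                         ) -> Tuple[List[Variant], List[Variant]]:
    """Train on low-order mutants, test on higher-order ones (hypothetical)."""
    train: List[Variant] = []
    test: List[Variant] = []
    for seq, fitness in variants:
        bucket = train if hamming(wild_type, seq) <= max_train_mutations else test
        bucket.append((seq, fitness))
    return train, test


train, test = mutation_count_split(
    "ACDE",
    [("ACDE", 1.0), ("GCDE", 0.8), ("GCDK", 0.3), ("GWDK", 0.1)],
)
print(len(train), len(test))  # 2 2
```

Because every test variant is farther from wild type than anything seen in training, a split like this probes extrapolation, which a random split of the same data does not.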

14

Yeast MoClo secretion and surface display toolkit 2.0: improvements and applications for analysis of protein-protein interactions and whole-cell biocatalysis

Juric, V.; Erwin, L. G.; O'Riordan, N. M.; Maher, E.; Holmes, J. D.; Young, P. W.

2025-08-19 synthetic biology 10.1101/2025.08.19.671047 medRxiv

Top 0.1%

8.9%

Saccharomyces cerevisiae is an invaluable model organism for both fundamental biological research and biotechnological applications, including recombinant protein production as well as protein and metabolic engineering. We previously developed a modular cloning (MoClo) based toolkit for S. cerevisiae that facilitates rapid optimization of signal peptides and anchor proteins for efficient secretion and/or surface display of heterologous proteins of interest. Here we describe further improvements and applications of this yeast secretion and display (YSD) toolkit. New parts encoding anchor proteins based on N-terminal fusion to a truncated Aga1 and C-terminal fusion to Aga2, each with three possible epitope tag options, are described. We also added parts that facilitate high-throughput detection of secreted proteins of interest through GFP fluorescence complementation and parts encoding "secretion boosting" yeast proteins, whose overexpression has previously been reported to enhance secretion of heterologous proteins. In addition, two surface display applications of the toolkit are showcased. We demonstrate that yeast surface display of an anti-GFP nanobody allows cost-effective evaluation of the interactions of GFP-tagged proteins of interest, either by flow cytometry or yeast-based co-immunoprecipitation. In addition, using yeast cells as whole-cell catalysts, we show that co-display of the poly(ethylene terephthalate) (PET) degrading enzyme leaf-branch compost cutinase with hydrophobin1 enhances the breakdown of PET plastic, while triple co-display of these proteins with MHETase causes complete conversion of the intermediate mono(2-hydroxyethyl) terephthalate (MHET) to terephthalic acid. The diverse applications described herein demonstrate the broad utility of the updated MoClo YSD toolkit 2.0 in both synthetic biology and other research fields.

15

Experimental Evaluation of AI-Driven Protein Design Risks Using Safe Biological Proxies

Ikonomova, S. P.; Wittmann, B. J.; Piorino, F.; Ross, D. J.; Schaffter, S. W.; Vasilyeva, O. B.; Horvitz, E.; Diggans, J.; Strychalski, E. A.; Lin-Gibson, S.; Taghon, G. J.

2025-05-16 synthetic biology 10.1101/2025.05.15.654077 medRxiv

Top 0.1%

8.0%

Advances in machine learning are providing leaps forward for beneficial applications of protein engineering, while also raising concerns about biosecurity. Recently, Wittmann et al. described an in silico pipeline of generative AI tools to reformulate sequences of concern (SOCs) as synthetic homologs that may evade detection by biosecurity screening software (BSS) used by nucleic acid synthesis providers. Experimental testing of synthetic homologs is required to ascertain the true severity of this vulnerability. We present a generalizable framework to assess biosecurity risk consisting of testing, evaluation, validation, and verification (TEVV) of AI-assisted protein design (AIPD). We determine that common AIPD models in use at the time this study was initiated (early 2024) are not yet powerful enough to reliably rewrite the sequence of a given protein, while both maintaining activity and evading detection by BSS.

16

BLOSUM Is All You Learn - Generative Antibody Models Reflect Evolutionary Priors

Ucar, T.; Sormanni, P.

2025-10-27 bioengineering 10.1101/2025.10.26.684652 medRxiv

Top 0.1%

7.9%

Generative models have emerged as powerful tools for antibody sequence design, with recent studies demonstrating that log-likelihood scores from these models can correlate with binding affinity and potentially serve as effective ranking metrics. This raises a fundamental question: why should log-likelihood scores from generative models correlate with binding affinity? In this work, we investigate the biochemical basis of these model-derived log-likelihoods by comparing them with classical evolutionary similarity metrics. We find that BLOSUM similarity scores between designed and parental antibody sequences correlate strongly with measured binding affinity, on par with the predictive performance of a state-of-the-art diffusion-based generative model. Moreover, these BLOSUM scores also align closely with log-likelihoods from multiple generative models, suggesting that such models may be implicitly learning evolutionary priors encoded in substitution matrices. When computed with respect to a known binder, both BLOSUM scores and log-likelihoods act as approximate measures of sequence distance from that reference. As this distance increases, the likelihood of a candidate being a binder decreases, explaining the observed correlation between these scores and binding affinity. In contrast, similarity scores based on position weight matrices (PWMs) and position-specific scoring matrices (PSSMs), which do not rely on knowledge of the parental sequence, show weaker and less consistent alignment with binding affinity, with performance depending on the background sequence data. Additionally, using consensus sequences in place of parental sequences to compute BLOSUM scores largely eliminates the observed correlation with affinity, underscoring the context-specific nature of the correlations. These findings highlight the potential of interpretable, evolution-inspired metrics to complement generative modeling in antibody design, offering insights into both model behavior and biological relevance.
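
A parental-sequence BLOSUM score can be sketched in Python, assuming equal-length, ungapped sequences (for example, designed variants of a fixed-length CDR) and a plain per-position sum of substitution scores; the exact scoring used in the study may differ. Only a small excerpt of BLOSUM62 is embedded so the block is self-contained; in practice the full matrix would be loaded from a standard source, such as Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`. The function names and sequences are hypothetical.

```python
from typing import Dict, Tuple

# Excerpt of BLOSUM62 (symmetric, one orientation stored). Pairs outside
# this excerpt raise KeyError; use the full matrix for real sequences.
BLOSUM62_EXCERPT: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("S", "S"): 4, ("T", "T"): 5, ("S", "T"): 1,
    ("Y", "Y"): 7, ("F", "F"): 6, ("F", "Y"): 3,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
}


def substitution_score(a: str, b: str,
                       matrix: Dict[Tuple[str, str], int]) -> int:
    """Look up a symmetric substitution score in either orientation."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def blosum_similarity(parent: str, design: str,
                      matrix: Dict[Tuple[str, str], int] = BLOSUM62_EXCERPT) -> int:
    """Sum of per-position substitution scores between parent and design."""
    if len(parent) != len(design):
        raise ValueError("this sketch assumes ungapped, equal-length sequences")
    return sum(substitution_score(p, d, matrix) for p, d in zip(parent, design))


parent = "ASYD"  # hypothetical parental segment
designs = ["ATFE", "ASYD", "ASFD"]
ranked = sorted(designs, key=lambda d: blosum_similarity(parent, d), reverse=True)
print(ranked)  # ['ASYD', 'ASFD', 'ATFE']
```

Ranking by this score simply prefers designs that stay close to the parent (scores 21, 17, and 10 here), which matches the sequence-distance interpretation given in the abstract.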

17

Development of high-affinity, single-domain protein binders for neutralizing household allergens

Zhao, E. M.; Zhang, D. K.; Yue, Q.; He, Q.; Li, Y.; Qu, Z.; Zhao, G.; Andreissen, N.

2025-08-03 bioengineering 10.1101/2025.08.03.668213 medRxiv

Top 0.1%

7.8%

Feline, canine, and human dust mite allergens drive significant IgE-mediated allergies in the household environment. Here, we describe the discovery and characterization of novel protein binders, termed AVA, which efficiently bind and disrupt major allergens from cats (Fel d 1), dogs (Can f 1, Can f 2), and dust mites (Der p 1, Der p 2). Leveraging camelid single-domain variable-heavy chain (VHH) antibodies, we identified and optimized VHH sequences with robust affinity towards the individual allergens. AVA disrupted critical allergen substructures (Fel d 1) and catalytic functions (Der p 1) in molecular dynamics simulations and in in vitro assays, respectively. Paired with robust thermostability and non-toxic in vitro viability profiles, AVA represents a promising approach for downstream household allergen mitigation.

18

BoltzGen: Toward Universal Binder Design

Stark, H.; Faltings, F.; Choi, M.; Xie, Y.; Hur, E.; O'Donnell, T. J.; Bushuiev, A.; Ucar, T.; Passaro, S.; Mao, W.; Reveiz, M.; Bushuiev, R.; Portnoi, T.; Pluskal, T.; Sivic, J.; Kreis, K.; Vahdat, A.; Ray, S.; Goldstein, J. T.; Savinov, A.; Hambalek, J. A.; Gupta, A.; Taquiri-Diaz, D. A.; Zhang, Y.; Snyder, S. J.; Hatstat, A. K.; Arada, A.; Kim, N. H.; Fan, H.; Tackie-Yarboi, E.; Boselli, D.; Schnaider, L.; Liu, C. C.; Li, G.-W.; Hnisz, D.; Sabatini, D. M.; DeGrado, W. F.; Wohlwend, J.; Corso, G.; Barzilay, R.; Jaakkola, T.

2026-06-16 bioengineering 10.1101/2025.11.20.689494 medRxiv

Top 0.1%

7.7%

We introduce BoltzGen, an all-atom generative model for designing proteins and peptides across all modalities to bind a wide range of biomolecular targets. BoltzGen builds strong structural reasoning capabilities about target-binder interactions into its generative design process. This is achieved by unifying design and structure prediction, resulting in a single model that also reaches state-of-the-art folding performance. BoltzGen's generation process can be controlled with a flexible design specification language over covalent bonds, structure constraints, binding sites, and more. We experimentally validate these capabilities in eight diverse design campaigns with functional and affinity readouts across 26 targets. In our experiments, binder modalities span from nanobodies to disulfide-bonded peptides, and targets from disordered proteins to small molecules. In particular, we identify nanobody binders for novel targets with low similarity to proteins with already known bound structures. We release model weights, data, and both inference and training code at: https://github.com/HannesStark/boltzgen.

19

Improving antibody affinity using laboratory data with language model guided design

Krause, B.; Subramanian, S.; Yuan, T.; Yang, M.; Sato, A.; Naik, N.

2023-11-03 synthetic biology 10.1101/2023.09.13.557505 medRxiv

Top 0.1%

6.9%

Protein design involves navigating vast sequence spaces to discover sequences with desired traits. Language models (LMs) pretrained on universal protein datasets have shown potential to make this search space tractable. However, LMs trained solely on natural sequences have limitations in creating proteins with novel functions. In this work, we used a combination of methods to finetune pretrained LMs on laboratory data collected in an anti-CD40L single domain antibody library campaign to develop an ensemble scoring function to model the fitness landscape and guide the design of new antibodies. Laboratory experiments confirmed improved CD40L affinity in the designed antibodies. Notably, the designs improved the affinities of four antibodies, originally ranging from 1 nanomolar to 100 picomolar, all to below 25 picomolar, approaching the limit of detection. This work is a promising step towards realizing the potential of LMs to leverage laboratory data to develop improved treatments for diseases.

20

Computational redesign of a thermostable T7 RNA polymerase

Baumer, Z. T.; Whitehead, T. A.

2025-11-12 biochemistry 10.1101/2025.11.12.688101 medRxiv

Top 0.1%

6.8%

T7 RNA polymerase (T7 RNAP) is a foundational enzyme for biotechnology, but its utility for many potential applications is limited by its low thermal stability (43-44°C). While stabilized variants exist, the most stable commercial version has a proprietary sequence. In this work we developed a highly stable T7 RNAP using structure-based computational design. We combined mutations from previously stabilized variants (M5, M8, V7abcd) with new mutations identified by PROSS. These mutations were filtered using data-driven heuristics to preserve function. Our final design, T7T+, contains 30 point mutations relative to the original T7 RNAP and demonstrates a functional stability (T50) of 54.9°C in a thermal challenge assay, which is 2.4°C higher than the most stable published open-source variant to date. Circular dichroism spectroscopy showed an apparent melting temperature of 53.8°C. T7T+ retains 59% of wild-type activity at 37°C. Sixteen of the 18 tested protein designs had higher stability against thermal challenge than the genetic background, attesting to the high success rates of existing non-deep-learning computational methods for the design of stable, functional proteins. A plasmid encoding T7T+ has been deposited in Addgene and is freely available for non-commercial use.
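
T50 is commonly defined as the temperature at which an enzyme retains half of its activity after a fixed-duration heat challenge. A minimal Python sketch, assuming residual activities normalized to an unheated control and measured at ascending temperatures, estimates it by linear interpolation between the two points that bracket 50%; many studies fit a sigmoid instead, and the function name and data below are hypothetical, not taken from the T7T+ study.

```python
from typing import Sequence


def t50_by_interpolation(temps_c: Sequence[float],
                         residual: Sequence[float]) -> float:
    """Estimate T50 (deg C) where residual activity first falls to 0.5.

    Assumes temps_c is sorted ascending and residual is the fraction of
    activity remaining after heat challenge, relative to an unheated control.
    """
    points = list(zip(temps_c, residual))
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        # Find the first interval whose endpoints bracket the 50% level.
        if r0 >= 0.5 >= r1 and r0 != r1:
            return t0 + (r0 - 0.5) * (t1 - t0) / (r0 - r1)
    raise ValueError("residual activity never crosses 0.5 in the sampled range")


# Hypothetical thermal-challenge data (not from the study).
print(round(t50_by_interpolation([50, 52, 54, 56], [0.9, 0.7, 0.3, 0.1]), 2))  # 53.0
```

Linear interpolation is used here only for transparency; with sparse temperature points, a fitted sigmoid usually gives a more robust estimate.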